Finding Common Origins of Milky Way Stars

Author

Andersen Chang, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

Published

May 30, 2025

Exploratory Data Analysis

# get X (using mean-imputed) and metadata for EDA
metadata <- metadata$train
feature_modes <- list("small" = 7, "medium" = 11, "big" = 19)
train_data_ls <- purrr::map(
  feature_modes,
  ~ get_abundance_data(data_mean_imputed$train, feature_mode = .x)
)
features_ls <- purrr::map(train_data_ls, ~ colnames(.x))
X <- train_data_ls$big
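
The chunk above reads from a mean-imputed copy of the training data, but the imputation step itself is not shown here. As a rough illustration only, and assuming plain per-column mean imputation (the authors' pipeline may differ), the idea can be sketched in base R. The helper `impute_col_means()` and the toy values are hypothetical and not part of the original code.

```r
# Hypothetical helper (not from the original pipeline): replace each column's
# missing values with the mean of that column's observed values.
impute_col_means <- function(X) {
  X <- as.data.frame(X)
  for (j in seq_along(X)) {
    miss <- is.na(X[[j]])
    X[[j]][miss] <- mean(X[[j]], na.rm = TRUE)
  }
  X
}

# Toy abundance table with one missing entry per column (made-up values).
toy <- data.frame(
  FE_H = c(-1.2, NA, -0.8),
  MG_FE = c(0.3, 0.1, NA)
)
toy_imputed <- impute_col_means(toy)
```

Mean imputation leaves each column's mean unchanged but shrinks its variance, which is worth keeping in mind when reading the histograms and boxplots that follow.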

Feature Distributions

# plot overall distribution
ggwrappers::plot_histogram(X) +
  ggplot2::facet_wrap(~ variable, scales = "free_x") +
  ggplot2::labs(x = "Data")

Distribution of abundance values per feature in (mean-imputed) training data.

# plot boxplots per GC
plt_df <- dplyr::bind_cols(
  X,
  metadata |> dplyr::select(GC_NAME)
) |> 
  dplyr::group_by(GC_NAME) |> 
  dplyr::mutate(
    GC_NAME = sprintf("%s (n = %d)", GC_NAME, dplyr::n())
  ) |> 
  dplyr::ungroup()
plt_vars <- sort(colnames(X))
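plt_ls <- list()
# build one boxplot panel per feature
for (plt_var in plt_vars) {
  plt_ls[[plt_var]] <- plt_df |> 
    ggplot2::ggplot() +
    ggplot2::aes(
      # order GCs along the x-axis by their mean value of the feature
      # (reorder() uses the mean by default)
      x = reorder(GC_NAME, !!rlang::sym(plt_var)), 
      y = !!rlang::sym(plt_var),
      fill = GC_NAME
    ) +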
    ggplot2::geom_boxplot() +
    ggplot2::labs(x = "GC Name") +
    vthemes::theme_vmodern() +
    ggplot2::theme(
      axis.text.x = ggplot2::element_text(
        angle = 90, hjust = 1, vjust = 0.5
      ),
      legend.position = "none"
    )
}
plt <- patchwork::wrap_plots(plt_ls, ncol = 2) +
  patchwork::plot_layout(axis_titles = "collect")
subchunkify(
  plt, fig_height = 30, fig_width = 10, 
  caption = "'Distribution of abundance values per feature and GC in (mean-imputed) training data.'"
)

Distribution of abundance values per feature and GC in (mean-imputed) training data.


Galactic Coordinate Plots

Scatter plots of the star locations (given by galactic coordinates), each shown in an unjittered and a jittered version, colored in turn by each of the following variables:

GC_NAME, AL_FE, C_FE, CA_FE, CI_FE, CO_FE, CR_FE, FE_H, K_FE, MG_FE, MN_FE, N_FE, NA_FE, NI_FE, O_FE, S_FE, SI_FE, TI_FE, TIII_FE, V_FE
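
The code behind the galactic coordinate plots is not shown, so the following is only a minimal sketch of how an unjittered and a jittered version might be produced with ggplot2. It uses synthetic data; the column names `GLON` and `GLAT`, the cluster labels, and the jitter widths are assumptions for illustration, not the authors' settings.

```r
library(ggplot2)

# Synthetic stand-in for the star metadata. GLON / GLAT are assumed names for
# galactic longitude / latitude; the three clusters are made up.
set.seed(1)
n_per_gc <- 30
centers <- data.frame(
  GC_NAME = c("GC_A", "GC_B", "GC_C"),
  lon0 = c(20, 150, 300),
  lat0 = c(-30, 10, 45)
)
stars <- centers[rep(seq_len(nrow(centers)), each = n_per_gc), ]
stars$GLON <- stars$lon0 + rnorm(nrow(stars), sd = 0.2)
stars$GLAT <- stars$lat0 + rnorm(nrow(stars), sd = 0.2)

base_plot <- ggplot(stars, aes(x = GLON, y = GLAT, color = GC_NAME)) +
  labs(x = "Galactic longitude (deg)", y = "Galactic latitude (deg)")

# Unjittered: stars in the same cluster nearly coincide on the sky.
p_unjittered <- base_plot + geom_point(alpha = 0.6)

# Jittered: random offsets spread overlapping points apart; fixing the seed
# makes the offsets reproducible.
p_jittered <- base_plot +
  geom_point(
    alpha = 0.6,
    position = position_jitter(width = 5, height = 5, seed = 1)
  )
```

Because stars belonging to one cluster occupy almost the same position on the sky, the unjittered view collapses each cluster to a single blob. Jittering trades positional accuracy for visibility of the individual points and their colors.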

Pair Plots